ABSTRACT

The Dutch Identity is a useful way to reexpress the basic equations of item
response theory (IRT) that relate the manifest probabilities to the item
response functions (IRFs) and the latent trait distribution. The identity may
be exploited in several ways, for example: (a) to show how IRT models behave
for large numbers of items (they are submodels of second-order log-linear
models for $2^J$ tables); (b) to suggest new ways to assess the dimensionality
of IRT models (factor analysis of matrices composed of second-order
interactions from log-linear models); (c) to give insight into the structure
of latent class models; and (d) to illuminate the problem of identifying the
IRFs and the latent trait distribution from sample data.
There are few mathematical tools that have proved useful in the study of
the structure of item response theory (IRT) models. This is especially true for
the so-called "marginal maximum likelihood" approach, in which the distribution
of the latent variable is integrated out and the would-be analyser is left
facing an intractable integral that must be evaluated numerically (Bock and
Lieberman, 1970). While the EM algorithm (Bock and Aitken, 1981) can be used to
simplify this integration, this fact is mainly useful in computing maximum
likelihood estimates and does not lead to any insight into the structure of the
models themselves.

The purpose of this paper is to introduce a new tool that does, in some
cases, make the integrals disappear and allows the structure of the model to
appear in useful new ways. The remainder of this paper is organized as follows.
Section 1 sets up the notation; section 2 states and proves the new result,
the Dutch Identity. Section 3 illustrates its value in several problems and
section 4 contains additional discussion.
1. NOTATION 

The notation follows that in Holland (1981), Cressie and Holland (1983),
and Holland and Rosenbaum (1986). We let C denote a population of examinees and
T a specific test. The zero-one variable $x_j$ denotes a correct or incorrect
response on item j in T, and the response vector, x, is given by:

$$x = (x_1, \ldots, x_J).$$

Let the proportion of examinees in the population, C, who would produce
response vector x when tested with T be denoted p(x). Clearly, we have

$$p(x) \ge 0 \quad\text{and}\quad \sum_x p(x) = 1.$$

The $2^J$ values, p(x), are called the manifest probabilities. Let X be the
response vector of a randomly selected examinee from C on test T. The
probability function for X, Prob{X=x}, is just p(x), i.e.

$$\mathrm{Prob}\{X = x\} = p(x).$$

Item response models restrict the form of the manifest probabilities, p(x), in
the following way. First of all, the value of a latent variable, $\theta$, is
assumed to be associated with each examinee in C such that

a) given $\theta$, the coordinates of X are independent, i.e.

$$P(X = x \mid \theta) = \prod_j P(X_j = x_j \mid \theta), \text{ and}$$

b) the item response functions, $P(X_j = 1 \mid \theta) = P_j(\theta)$, are
usually restricted in some way, e.g., to be monotone
increasing in $\theta$ or to have a specified functional form
such as the one-, two-, or three-parameter logistic
form (Birnbaum, 1968), and

c) the distribution function of $\theta$ over C is denoted
by $F(\theta)$.

Since $x_j$ is dichotomous, we may write

$$P(X_j = x_j \mid \theta) = P_j(\theta)^{x_j}\, Q_j(\theta)^{1 - x_j},$$

where $Q_j(\theta) = 1 - P_j(\theta)$. The conditional independence assumption
may then be written as

$$P(X = x \mid \theta) = \prod_j P_j(\theta)^{x_j}\, Q_j(\theta)^{1 - x_j}.$$

But, by the usual rules for manipulating conditional probabilities, we have

$$P(X = x) = \int P(X = x \mid \theta)\, dF(\theta),$$

and consequently all (locally independent) item response models may be viewed
as restricting p(x) to have the form

$$p(x) = \int \prod_j P_j(\theta)^{x_j}\, Q_j(\theta)^{1 - x_j}\, dF(\theta). \tag{1}$$

Equation (1) relates the manifest probabilities, {p(x)}, to the latent item
response functions and the distribution of the latent variable, $F(\theta)$.

In this paper, as in the discussion of the marginal maximum likelihood approach, 
(1) is taken to be the defining characteristic of any IRT model. The integral 
in (1) is the "intractable integral" referred to earlier and is often an 
obstacle to the further understanding of IRT models. 
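
As a rough illustration of equation (1), the sketch below evaluates the manifest probabilities for a tiny test when F is a discrete distribution, so that the integral becomes a finite sum. The two-parameter logistic items, the support points, and the weights are all hypothetical values chosen for the example; none of them come from this paper.

```python
import itertools
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def manifest_probabilities(item_params, thetas, weights):
    """Evaluate equation (1) for every 0/1 response vector x.

    Assumes F puts mass weights[i] on thetas[i], so the integral in (1)
    is a finite sum. Items are 2PL with (slope, difficulty) pairs.
    """
    probs = {}
    for x in itertools.product((0, 1), repeat=len(item_params)):
        total = 0.0
        for theta, w in zip(thetas, weights):
            cond = 1.0  # P(X = x | theta) under local independence
            for (a, b), xj in zip(item_params, x):
                pj = logistic(a * (theta - b))
                cond *= pj if xj == 1 else 1.0 - pj
            total += w * cond
        probs[x] = total
    return probs


# Hypothetical example: three items, three latent support points.
ITEM_PARAMS = [(1.0, -0.5), (1.5, 0.0), (0.8, 0.7)]
THETAS = [-1.0, 0.0, 1.0]
WEIGHTS = [0.3, 0.4, 0.3]

p = manifest_probabilities(ITEM_PARAMS, THETAS, WEIGHTS)
for x, px in sorted(p.items()):
    print(x, round(px, 4))
```

With a continuous F the inner sum would be replaced by numerical quadrature, which is the "intractable integral" in practice.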

The manifest probabilities, {p(x)}, are the governing quantities in the
likelihood function that arises when data are collected by randomly sampling N
examinees from C and testing them with T. In this situation let

n(x) = number of examinees in the sample producing response vector x.

Then, if C is large compared to N, {n(x)} follows a multinomial distribution
with parameters N and {p(x)}. The likelihood function is the multinomial
probability function (except for a multiplicative constant) and is given by

$$\prod_x p(x)^{n(x)}.$$

Thus the log-likelihood function is

$$L = \sum_x n(x) \log p(x). \tag{2}$$

Hence, it is natural to study the structure of log p(x), i.e., the "log-manifest
probabilities," and I shall do just that.

In this context, a model for p(x) is simply a restriction on the form of
p(x) in (2) and, in particular, IRT models are formed by equation (1) and
possible restrictions on the $\{P_j(\theta)\}$ and $F(\theta)$.



Cressie and Holland (1983) studied the structure of the models defined by
(1) and were successful in completely characterizing p(x) in the case of the
Rasch model, the case where the IRFs have the form specified by

$$\log\left(\frac{P_j(\theta)}{Q_j(\theta)}\right) = a(\theta - b_j). \tag{3}$$

In (3), a is the common discrimination parameter and $b_j$ is the item
difficulty parameter. In this paper, I will generalize a formula obtained by
Cressie and Holland that re-expresses (1) in a useful way. This generalization
is the Dutch Identity.

2. THE DUTCH IDENTITY 

Theorem 1 gives the basic result of this paper. 
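Theorem 1: (The Dutch Identity) If p(x) satisfies (1) then for any zero-one
vector y

$$\frac{p(x)}{p(y)} = E\Big(\exp\Big[\sum_j (x_j - y_j)\lambda_j\Big] \,\Big|\, X = y\Big), \tag{4}$$

where $\lambda_j = \lambda_j(\theta)$ is the item logit function,

$$\lambda_j(\theta) = \log\left(\frac{P_j(\theta)}{Q_j(\theta)}\right).$$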

Before going through the easy proof of (4), let me make a few comments
about it. First of all, in (4), x is thought of as varying over all possible
response vectors while y is thought of as a fixed response vector. In a sense y
is an arbitrary choice of "origin." In using this identity we may choose y to
have desirable properties. The right-hand side of (4) is a conditional
expectation of a certain function that involves
$\lambda(\theta) = (\lambda_1(\theta), \ldots, \lambda_J(\theta))$, given
that X=y. More specifically, it is the posterior moment-generating function of
$\lambda = \lambda(\theta)$ given X=y, evaluated at the point x-y.

(Footnote) I discovered this identity and some of its consequences while
lecturing in the Netherlands during October, 1986. Since this discovery was in
no small part due to the stimulating psychometric atmosphere in Holland, I
decided to call the result the Dutch Identity.

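Proof: We may express (1) as

$$p(x) = \int \prod_j \left(\frac{P_j(\theta)}{Q_j(\theta)}\right)^{x_j} \prod_j Q_j(\theta)\, dF(\theta)$$

$$= \int \exp\Big\{\sum_j x_j \lambda_j(\theta)\Big\} \prod_j Q_j(\theta)\, dF(\theta)$$

$$= \int \exp\Big\{\sum_j (x_j - y_j)\lambda_j(\theta)\Big\} \prod_j \left(\frac{P_j(\theta)}{Q_j(\theta)}\right)^{y_j} \prod_j Q_j(\theta)\, dF(\theta),$$

so

$$p(x) = \int \exp\Big\{\sum_j (x_j - y_j)\lambda_j(\theta)\Big\} \prod_j P_j(\theta)^{y_j} Q_j(\theta)^{1 - y_j}\, dF(\theta)$$

and

$$\frac{p(x)}{p(y)} = \int \exp\Big\{\sum_j (x_j - y_j)\lambda_j(\theta)\Big\} \left[\frac{\prod_j P_j(\theta)^{y_j} Q_j(\theta)^{1 - y_j}\, dF(\theta)}{p(y)}\right].$$

However, the quantity in brackets is the posterior distribution function of
$\theta$ given X=y, i.e.,

$$\frac{\prod_j P_j(\theta)^{y_j} Q_j(\theta)^{1 - y_j}\, dF(\theta)}{p(y)} = dF(\theta \mid X = y).$$

Thus

$$\frac{p(x)}{p(y)} = \int \exp\Big\{\sum_j (x_j - y_j)\lambda_j(\theta)\Big\}\, dF(\theta \mid X = y)$$

$$= E\Big(\exp\Big\{\sum_j (x_j - y_j)\lambda_j\Big\} \,\Big|\, X = y\Big). \quad \text{QED.}$$
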
This proof follows the type of argument used by Cressie (1982) to prove a
similar type of identity that is useful in empirical Bayes applications. To my
knowledge, the Dutch Identity has never been used in the analysis of IRT models,
although Cressie and Holland (1983) derived the special case of (4) in which
y=0. Finally, it should be mentioned that in (4) the fact that $\theta$ is a
scalar is not used, and in fact $\theta$ might be a vector.
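
The identity in (4) can be checked numerically on a small example. The sketch below assumes a discrete latent distribution and two-parameter logistic items (all numerical values are hypothetical), computes both sides of (4) for every pair of response vectors x and y, and reports the largest discrepancy.

```python
import itertools
import math

# Hypothetical 2PL items (slope, difficulty) and a discrete F; not from the paper.
ITEMS = [(1.0, -0.5), (1.5, 0.0), (0.8, 0.7)]
THETAS = [-1.5, -0.5, 0.5, 1.5]
WEIGHTS = [0.2, 0.3, 0.3, 0.2]


def irf(a, b, theta):
    """P_j(theta) for a 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_logit(a, b, theta):
    """lambda_j(theta) = log(P_j/Q_j), which is a(theta - b) for a 2PL item."""
    return a * (theta - b)


def cond_prob(x, theta):
    """P(X = x | theta) under local independence."""
    out = 1.0
    for (a, b), xj in zip(ITEMS, x):
        pj = irf(a, b, theta)
        out *= pj if xj else 1.0 - pj
    return out


def p(x):
    """Manifest probability from equation (1) with discrete F."""
    return sum(w * cond_prob(x, t) for t, w in zip(THETAS, WEIGHTS))


def posterior(y):
    """Posterior weights of the support points given X = y."""
    py = p(y)
    return [w * cond_prob(y, t) / py for t, w in zip(THETAS, WEIGHTS)]


def dutch_rhs(x, y):
    """Right-hand side of (4): posterior mgf of lambda at x - y."""
    total = 0.0
    for t, pw in zip(THETAS, posterior(y)):
        s = sum((xj - yj) * item_logit(a, b, t)
                for (a, b), xj, yj in zip(ITEMS, x, y))
        total += pw * math.exp(s)
    return total


def max_identity_error():
    vectors = list(itertools.product((0, 1), repeat=len(ITEMS)))
    return max(abs(p(x) / p(y) - dutch_rhs(x, y))
               for x in vectors for y in vectors)


print(max_identity_error())
```

The discrepancy should be at the level of floating-point rounding, since (4) is an exact identity rather than an approximation.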
3. SOME APPLICATIONS OF THE DUTCH IDENTITY 
3.1 An IRT Model That Is A Second-order Log-linear Model 

An IRT model for p(x) involves an integral, but log-linear models for p(x)
are much simpler and merely state that log p(x) is linear in some parameters,
i.e.

$$\log p(x) = \alpha + b(x)\beta, \tag{5}$$

where $\beta$ is a (column) vector of free parameters of length K, b(x) is a
(row) vector of K known constants, and $\alpha$ is the normalizing constant
that ensures that the p(x) sum to 1. Log-linear models for p(x) correspond to
log-linear models for $2^J$-contingency tables. These are widely used (e.g.,
Bishop, Fienberg, and Holland, 1975). Some examples are as follows. Throughout
the rest of this paper, t denotes vector or matrix transpose.

a) Independence. The coordinates of $X = (X_1, \ldots, X_J)$ are independent if
and only if

$$\log p(x) = \alpha + \sum_j \beta_j x_j. \tag{6}$$

In this case $b(x) = (x_1, \ldots, x_J)$ and $\beta^t = (\beta_1, \ldots, \beta_J)$.

b) Generalized Rasch Model. In Cressie and Holland (1983) the following model
is discussed in detail:

$$\log p(x) = \alpha + \sum_j \beta_j x_j + \sum_k \gamma_k\, \delta(k, x_+), \tag{7}$$

where

$$\delta(k, x_+) = \begin{cases} 1 & \text{if } x_+ = k, \\ 0 & \text{otherwise,} \end{cases} \qquad\text{and}\qquad x_+ = \sum_j x_j.$$

If $b(x) = (x_1, \ldots, x_J, \delta(1, x_+), \ldots, \delta(J, x_+))$
and $\beta^t = (\beta_1, \ldots, \beta_J, \gamma_1, \ldots, \gamma_J)$, then (7)
defines the class of extended Rasch models. If the $\gamma_k$ are restricted by
the inequalities indicated by Cressie and Holland (1983), then (7) defines the
class of Rasch models.

c) Second-order Exponential Models. Tsao (1967) defines a second-order
exponential (SOE) model by

$$\log p(x) = \alpha + \sum_j \beta_j x_j + \sum_{r<s} \gamma_{rs}\, x_r x_s. \tag{8}$$

In this case $b(x) = (x_1, \ldots, x_J, x_1 x_2, x_1 x_3, \ldots, x_{J-1} x_J)$
and $\beta^t = (\beta_1, \ldots, \beta_J, \gamma_{12}, \gamma_{13}, \ldots, \gamma_{J-1,J})$.

An interesting question is whether or not an IRT model satisfying (1) can
ever be equivalent to a SOE model. This section shows that from the Dutch
Identity one may construct an IRT model that is a submodel of the class of SOE
models. The next section shows that this construction is far more general than
it might first appear. I will state the results as a corollary to Theorem 1 in
which $\theta$ is a column vector.

Corollary 1: If, for some choice of y, the posterior distribution of
$\theta \mid X = y$ is normal, i.e.

$$F(\theta \mid X = y) \text{ is } N_D(\mu_y, \Sigma_y),$$

and if the item logit functions $\lambda_j(\theta)$ are linear, i.e.

$$\lambda_j(\theta) = \lambda_j(\mu_y) + a_j(\theta - \mu_y),$$

where $a_j = (a_{j1}, \ldots, a_{jD})$ and D is the dimensionality of $\theta$, then

$$\log p(x) = \alpha + (x-y)^t \lambda(\mu_y) + \tfrac{1}{2}(x-y)^t A \Sigma_y A^t (x-y), \tag{9}$$

where $A^t = (a_1^t, \ldots, a_J^t)$ and $\alpha = \log p(y)$.

I first prove this result using Theorem 1 and then I will comment on it.
Proof: From (4) we have

$$\log p(x) = \alpha + \log E\left(e^{(x-y)^t \lambda} \mid X = y\right), \tag{10}$$

where $\alpha = \log p(y)$. But by assumption $\theta \mid X = y$ is
$N(\mu_y, \Sigma_y)$, and since $\lambda(\theta)$ is a linear function of
$\theta$, the posterior distribution of $\lambda$ is also multivariate normal.
Hence, we have

$$E(\lambda \mid X = y) = \lambda(\mu_y) \quad\text{and}\quad \mathrm{Cov}(\lambda \mid X = y) = A \Sigma_y A^t. \tag{11}$$

Now remember that the expected value in (10) is the moment generating function
(mgf) of $\lambda$ evaluated at (x-y). However, the mgf of a normal variable Z
with mean $\mu$ and covariance $\Sigma$ evaluated at s is

$$E[\exp\{s^t Z\}] = \exp\{s^t \mu + \tfrac{1}{2} s^t \Sigma s\}. \tag{12}$$

Applying (12) to (11) and (10) with s = x-y and taking logs yields (9). QED.

To see that (9) is, in fact, of the form (8), expand the terms in (9) and
collect them to form

$$\log p(x) = \{\alpha + \tfrac{1}{2} y^t B y - y^t \lambda(\mu_y)\} + x^t\{\lambda(\mu_y) - B y\} + \tfrac{1}{2} x^t B x, \tag{13}$$

where $B = A \Sigma_y A^t$. Now suppose $B = \Gamma + D_b$ where $\Gamma$ has a
zero diagonal, b is the diagonal of B, and $D_b$ is the diagonal matrix based
on b. Thus (9) is equivalent to

$$\log p(x) = \{\alpha + \tfrac{1}{2} y^t B y - y^t \lambda(\mu_y)\} + x^t\{\lambda(\mu_y) - B y + \tfrac{1}{2} b\} + \tfrac{1}{2} x^t \Gamma x, \tag{14}$$

since $x_j^2 = x_j$.

If we now make the substitution

$$\alpha' = \alpha + \tfrac{1}{2} y^t B y - y^t \lambda(\mu_y),$$

$$\beta = \lambda(\mu_y) - B y + \tfrac{1}{2} b,$$

we see that (9) is equivalent to

$$\log p(x) = \alpha' + x^t \beta + \tfrac{1}{2} x^t \Gamma x, \tag{15}$$

which is just a matrix way of expressing (8).
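
The algebra that turns (9) into the SOE form (8) can be checked by brute force. The sketch below uses hypothetical values for A, the posterior covariance, the logits at the posterior mean, and y; it evaluates (9) directly and then evaluates the SOE form with first-order coefficients equal to the logits minus By plus half the diagonal of B (the half arises because a squared 0/1 value equals itself) and second-order coefficients equal to the off-diagonal entries of B.

```python
import itertools

# Hypothetical inputs (not from the paper): J = 3 items, D = 2 dimensions.
A = [[1.0, 0.2], [0.5, 0.9], [1.3, -0.4]]   # rows a_j of slopes
SIGMA = [[0.5, 0.1], [0.1, 0.3]]            # posterior covariance of theta given X=y
LAM = [0.4, -0.2, 0.1]                      # item logits evaluated at the posterior mean
Y = (1, 0, 1)                               # the fixed "origin" response vector
ALPHA = -2.0                                # stands in for log p(y); only algebra is checked
J = len(LAM)


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


B = matmul(matmul(A, SIGMA), transpose(A))  # B = A Sigma A^t


def quad(M, u, v):
    """u^t M v for J-vectors u, v."""
    return sum(u[r] * M[r][s] * v[s] for r in range(J) for s in range(J))


def form_9(x):
    d = [x[j] - Y[j] for j in range(J)]
    return ALPHA + sum(d[j] * LAM[j] for j in range(J)) + 0.5 * quad(B, d, d)


# SOE parameters implied by the expansion.
alpha_prime = ALPHA + 0.5 * quad(B, Y, Y) - sum(Y[j] * LAM[j] for j in range(J))
By = [sum(B[j][s] * Y[s] for s in range(J)) for j in range(J)]
beta = [LAM[j] - By[j] + 0.5 * B[j][j] for j in range(J)]
gamma = {(r, s): B[r][s] for r in range(J) for s in range(r + 1, J)}


def form_8(x):
    first = sum(beta[j] * x[j] for j in range(J))
    second = sum(g * x[r] * x[s] for (r, s), g in gamma.items())
    return alpha_prime + first + second


max_gap = max(abs(form_9(x) - form_8(x))
              for x in itertools.product((0, 1), repeat=J))
print(max_gap)
```

The two forms agree only on 0/1 vectors; for non-binary x the diagonal of B could not be absorbed into the first-order terms.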

The fact that an IRT model can exist that is a non-trivial example of a SOE
model (i.e., is not independent) is quite interesting in its own right. Lord
(1962) showed that second-order linear (as opposed to log-linear) models do not
give reasonable score distributions in general. This would not be true of the
model specified in (9) or (15).
3.2 IRT Models With Large Numbers of Items

It might be thought that the example given in Corollary 1 is unusual, but
the purpose of this section is to show that it holds as a limiting form for all
"smooth" unidimensional IRT models.

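When the number of items, J, is large, $\theta$ is a scalar, F has a density,
and y is a "typical" response vector, then the posterior distribution of
$\theta$ given X=y is approximately normal, i.e.,

$$dF(\theta \mid X = y) \approx \frac{1}{\sigma_y}\,\phi\!\left(\frac{\theta - \mu_y}{\sigma_y}\right) d\theta, \tag{16}$$

where $\phi(x)$ is the unit normal density function. Furthermore, if the item
logit functions, $\lambda_j(\theta)$, are differentiable they have the expansion

$$\lambda_j(\theta) = \lambda_j(\mu_y) + \left(\frac{\partial \lambda_j}{\partial \theta}\right)(\theta - \mu_y) + o(|\theta - \mu_y|), \tag{17}$$

with the derivative evaluated at $\mu_y$. Finally, if $\sigma_y$ is small, as it
will be for large enough J, the higher order terms in (17) can be ignored and
we have $\lambda$ approximately multivariate normal with mean vector,
$\lambda(\mu_y)$, and covariance matrix,
$\left(\frac{\partial \lambda}{\partial \theta}\right) \sigma_y^2 \left(\frac{\partial \lambda}{\partial \theta}\right)^t$,
a matrix of rank 1. Hence, because of Corollary 1, in this situation the
following equation will hold approximately (as $J \to \infty$) for any
unidimensional IRT model for which the IRFs are smooth and F is continuous:

$$\log p(x) = \alpha + (x-y)^t \lambda(\mu_y) + \tfrac{1}{2}\,\sigma_y^2\,(x-y)^t \left(\frac{\partial \lambda}{\partial \theta}\right)\left(\frac{\partial \lambda}{\partial \theta}\right)^t (x-y). \tag{18}$$
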

Equation (18) defines a submodel of the class of SOE models in (8) in which the
second-order parameters are restricted to a multiplicative form. In terms of
the free parameters that can be independently estimated, (18) is of the
following log-quadratic form:

$$\log p(x) = \alpha + (x^t \beta) + (x^t \gamma)^2. \tag{19}$$

Equation (19) does not define a log-linear model but rather a submodel of the
class of SOE models that has only 2J parameters rather than the full set of
$J + \binom{J}{2}$ parameters of the general SOE model.

The derivation of (18) depends only on the fact that (a) $\theta$ is
one-dimensional, (b) F is continuous, (c) J is large, (d) y is chosen so that
$F(\theta \mid X = y)$ is approximately normal with a small variance
$\sigma_y^2$, and (e) $\lambda_j(\theta)$ is differentiable. Since all models
in use usually assume (a), (b), and (e), and since the existence of y
satisfying (d) is well-known among users of BILOG (see, for example, Bock and
Mislevy, 1982), the representation of p(x) as a model of the form (18) is a
very general result. The only issue is how large J needs to be for it to hold.
This is a worthy topic for future research.

One implication of (18) is that there can be at most two parameters per
item consistently estimated for long tests. This is in accord with the general
fact that it is difficult to estimate three or more item parameters in an
unrestricted fashion for data sets that involve many examinees and many items,
even though IRFs are often parameterized with more than two parameters.
3.3 The Study of Test Dimensionality 

If $\theta$ is a vector parameter and has an approximate normal posterior
distribution $F(\theta \mid X = y)$ for some y, with mean $\mu_y$ and covariance
matrix $\Sigma_y$, then the generalization of (18) is

$$\log p(x) = \alpha + (x-y)^t \lambda(\mu_y) + \tfrac{1}{2}(x-y)^t \left(\frac{\partial \lambda}{\partial \theta}\right) \Sigma_y \left(\frac{\partial \lambda}{\partial \theta}\right)^t (x-y). \tag{20}$$

Letting

$$\beta_j = \lambda_j(\mu_y) \quad\text{and}\quad R = \left(\frac{\partial \lambda}{\partial \theta}\right) \Sigma_y \left(\frac{\partial \lambda}{\partial \theta}\right)^t,$$

we have

$$\log p(x) = \alpha + (x-y)^t \beta + \tfrac{1}{2}(x-y)^t R (x-y). \tag{21}$$

Equation (21) says that p(x) satisfies an SOE model in which the matrix of
second-order interactions is proportional to R. However, the rank of R is the
rank of $\Sigma_y$, which is the same as the dimensionality of the latent
variable $\theta$. Hence, equation (21) suggests that a way to factor-analyze
dichotomous items is to fit a SOE model to the $2^J$ table, {n(x)}, and then to
factor-analyze the matrix of second-order interactions, R. This method will be
especially appropriate when there are large numbers of items. It does not make
any assumption other than those made in section 3.2.
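
The rank claim for R can be illustrated with a small numerical sketch. Below, a hypothetical 5 x 2 matrix G of logit derivatives and a hypothetical 2 x 2 posterior covariance are multiplied out, and the rank of the resulting 5 x 5 matrix is computed by Gaussian elimination. All values and helper names are assumptions made for the example, and the sketch presumes G has full column rank.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def rank(M, tol=1e-10):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            f = M[i][c] / M[r][c]
            M[i] = [mi - f * mr for mi, mr in zip(M[i], M[r])]
        r += 1
    return r


# Hypothetical two-dimensional case: J = 5 items, D = 2.
G = [[1.0, 0.2], [0.5, 0.9], [1.3, -0.4], [0.7, 0.7], [0.9, 0.1]]
SIGMA_Y = [[0.5, 0.1], [0.1, 0.3]]
R = matmul(matmul(G, SIGMA_Y), transpose(G))

# Hypothetical unidimensional case: a single column of derivatives.
g = [[1.0], [0.5], [1.3], [0.7], [0.9]]
R1 = matmul(matmul(g, [[0.25]]), transpose(g))

print(rank(R), rank(R1))
```

In practice only the off-diagonal entries of R are identified by a fitted SOE model, so a rank computation on estimated interactions would need to treat the diagonal as missing.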

The matrix of second-order interactions in a SOE (or log-linear) model is
only a triangular array with no meaningful diagonal. Hence "factor analysis" of
such data is not easily interpretable in terms of covariance matrices and linear
regressions of items on factor scores. Instead, all I mean by factor analysis
is the decomposition of the elements $\gamma_{rs}$ in (8) into the following terms:

$$\gamma_{rs} = \gamma_r^{(1)}\gamma_s^{(1)} + \gamma_r^{(2)}\gamma_s^{(2)} + \cdots + \gamma_r^{(D)}\gamma_s^{(D)} \quad\text{for } 1 \le r < s \le J, \text{ and } D \ll J. \tag{22}$$

The vectors $\gamma^{(i)} = (\gamma_1^{(i)}, \ldots, \gamma_J^{(i)})$ must also
satisfy orthogonality constraints. The lengths of these vectors must also
decrease, $\|\gamma^{(1)}\| \ge \|\gamma^{(2)}\| \ge \cdots$.

In a more general vein, I am tempted to propose that a test measures D
dimensions in population C if representation (1) holds for its manifest
probabilities in population C with $\theta = (\theta_1, \ldots, \theta_D)$ and
if there is a response vector y such that $F(\theta \mid X = y)$ is more
concentrated about its center in every direction than $F(\theta)$ is. If
$F(\theta)$ and $F(\theta \mid X = y)$ both possess covariance matrices,
$\Sigma$ and $\Sigma_y$, then this condition could be expressed as

$$\Sigma - \Sigma_y > 0 \tag{23}$$

in the sense that this difference can be positive definite. This proposal is
based on the idea that if the test really measures all of the coordinates of
$\theta$ then, for at least one response vector, y, our knowledge of $\theta$
ought to be more precise in every $\theta$-direction if the response y is
observed than if the test is not given, in which case all that is available is
the unconditional distribution of $\theta$.

3.4 Latent Class Models 

The simplest latent class model has just two latent classes, which we can
label by two real numbers $\theta_1$ and $\theta_2$. Then equation (1) reduces to

$$p(x) = \sum_i \Big\{\prod_j P_j(\theta_i)^{x_j} Q_j(\theta_i)^{1 - x_j}\Big\}\, p_i, \tag{24}$$

where $p_1 + p_2 = 1$ are the proportions of examinees in C with $\theta_1$ and
$\theta_2$, respectively.

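This latent class model violates the assumption that F is continuous in the
strongest possible way, i.e., F is a two-point distribution. However, the Dutch
Identity, (4), is still valid for this case. The posterior distribution
$F(\theta \mid X = y)$ is also a two-point distribution concentrated on
$\theta_1$ and $\theta_2$ with

$$p_1(y) = P(\theta = \theta_1 \mid X = y)$$

and

$$p_2(y) = P(\theta = \theta_2 \mid X = y).$$

Hence the moment generating function in (4) is given by

$$E\Big(\exp\Big\{\sum_j (x_j - y_j)\lambda_j\Big\} \,\Big|\, X = y\Big) = p_1(y) \exp\Big\{\sum_j (x_j - y_j)\lambda_{j1}\Big\} + p_2(y) \exp\Big\{\sum_j (x_j - y_j)\lambda_{j2}\Big\},$$

where $\lambda_{j1} = \lambda_j(\theta_1)$ and $\lambda_{j2} = \lambda_j(\theta_2)$.
Applying the Dutch Identity yields

$$\frac{p(x)}{p(y)} = p_1(y) \exp\Big\{\sum_j (x_j - y_j)\lambda_{j1}\Big\} \Big[1 + \frac{p_2(y)}{p_1(y)} \exp\Big\{\sum_j (x_j - y_j)(\lambda_{j2} - \lambda_{j1})\Big\}\Big].$$

Let $\delta_j = \lambda_{j2} - \lambda_{j1}$ and $\beta_j = \lambda_{j1}$; then
taking logs we have

$$\log p(x) = \alpha + \sum_j (x_j - y_j)\beta_j + \log\Big(1 + e^{\gamma + \sum_j (x_j - y_j)\delta_j}\Big), \tag{25}$$

where $\alpha = \log(p(y)\,p_1(y))$ and $\gamma = \log(p_2(y)/p_1(y))$.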

Let LP(x) be the "logistic potential" function, i.e.

$$LP(x) = \log(1 + e^{x}).$$

(Note that the derivative of LP(x) is the logistic function, hence the name
"logistic potential.")

We may express (25) as

$$\log p(x) = \alpha + (x-y)^t \beta + LP(\gamma + (x-y)^t \delta). \tag{26}$$

Thus, the Dutch Identity reveals that the two-class latent class model for
dichotomous data is a log-nonlinear model of a very special form, (26).
Different choices of y can affect $\alpha$ and $\gamma$ in (26) but not
$\beta_j$ and $\delta_j$. This representation of the two-point latent class
model may yield alternative ways of fitting such models, and approximations to
LP(x) may also prove useful.
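
The logistic-potential representation of the two-class model can be verified numerically. The sketch below takes hypothetical class-specific success probabilities and class proportions, computes log p(x) directly from the two-class mixture, and compares it with the representation built from the class-1 logits, the logit differences between classes, and the posterior class odds given y.

```python
import itertools
import math

# Hypothetical two-class model (values not from the paper).
P1 = [0.2, 0.3, 0.4]   # P_j(theta_1): success probabilities in class 1
P2 = [0.7, 0.8, 0.6]   # P_j(theta_2): success probabilities in class 2
PI = (0.6, 0.4)        # class proportions p_1, p_2


def class_prob(x, probs):
    """P(X = x | class) under local independence."""
    out = 1.0
    for q, xj in zip(probs, x):
        out *= q if xj else 1.0 - q
    return out


def p(x):
    """Manifest probability of the two-class mixture."""
    return PI[0] * class_prob(x, P1) + PI[1] * class_prob(x, P2)


def logit(q):
    return math.log(q / (1.0 - q))


def LP(z):
    """Logistic potential: log(1 + e^z)."""
    return math.log1p(math.exp(z))


def log_p_via_lp(x, y):
    """log p(x) from the logistic-potential form with origin y."""
    py = p(y)
    post1 = PI[0] * class_prob(y, P1) / py   # p_1(y)
    post2 = PI[1] * class_prob(y, P2) / py   # p_2(y)
    alpha = math.log(py * post1)
    gamma = math.log(post2 / post1)
    beta = [logit(q) for q in P1]
    delta = [logit(q2) - logit(q1) for q1, q2 in zip(P1, P2)]
    lin = sum((xj - yj) * bj for xj, yj, bj in zip(x, y, beta))
    shift = sum((xj - yj) * dj for xj, yj, dj in zip(x, y, delta))
    return alpha + lin + LP(gamma + shift)


vectors = list(itertools.product((0, 1), repeat=len(P1)))
max_gap = max(abs(math.log(p(x)) - log_p_via_lp(x, y))
              for x in vectors for y in vectors)
print(max_gap)
```

Changing y in this sketch changes only the constant and the posterior log-odds term; the item-level coefficients are the same for every choice of origin.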
3.5 What Does An Observed Response Vector Tell Us About The Value Of A Latent
Variable?

The estimation of $\theta$ in IRT models is problematic. The LOGIST program,
Wingersky (1983), produces "maximum likelihood" estimates of $\theta$,
$\hat{\theta}$, while the approach used in BILOG, Mislevy and Bock (1982),
produces posterior expectations of $\theta$ given each possible response
vector, y, i.e., $E(\theta \mid X = y)$. However, in my opinion, it has always
been a mystery as to exactly what these quantities really mean since

a) the scale of $\theta$ is arbitrary,

b) for some choices of F, $E(\theta \mid X = y)$ will not exist,

c) the "likelihood function" used in LOGIST to compute
$\hat{\theta}$'s is not the real likelihood function for many
applications; e.g., when examinees are sampled from
a well-defined population the likelihood function in
(2) is the correct one.

The Dutch Identity provides a key to understanding this mystery. The
equation

$$\frac{p(x)}{p(y)} = E\left(e^{(x-y)^t \lambda} \mid X = y\right) \tag{27}$$

may be re-expressed in the following way. Let r = x-y and let

$$S_y = \{r : y + r = x = \text{a 0/1 vector}\}.$$

Thus $S_y$ is the set of all (0, 1, -1)-vectors r such that when added to y we
get a (0,1)-vector, x, back. Clearly, $S_y$ depends on y. Now (4) can be written
as

$$E\left(e^{r^t \lambda} \mid X = y\right) = \frac{p(y+r)}{p(y)}, \quad\text{all } r \in S_y. \tag{28}$$

Hence (28) says that for each fixed value of y, the moment generating function
for the conditional distribution of $\lambda$ given that X=y, evaluated at each
$r \in S_y$, equals the ratio p(y+r)/p(y). Since the manifest probabilities
are, in principle, the most that the data can ever determine, equation (28)
implies that for each y, the values of $E(e^{r^t \lambda} \mid X = y)$ for
$r \in S_y$ are the most that we can know about $\theta$. Suppose we let

$$g_r(\lambda) = e^{r^t \lambda}, \tag{29}$$

then (28) says that

$$E(g_r(\lambda) \mid X = y) = \frac{p(y+r)}{p(y)} \tag{30}$$

for all $r \in S_y$. Thus, (30) is an example of the so-called generalized
moment problem. We are interested in the conditional distribution of $\lambda$
given X=y. Equation (30) says that all we can know, in principle, are the
values of the expectation of $g_r(\lambda)$ for all $r \in S_y$ for this
conditional distribution. Kemperman (1968) shows how knowledge of these
generalized moments can be used to infer knowledge of the conditional
distribution of $\lambda$ given X=y. These inferences consist of bounds on
probabilities of the form

$$L_y(S) \le P(\lambda \in S \mid X = y) \le U_y(S), \tag{31}$$

where S is a set of $\lambda$-values.

Hence the mystery of what can be "estimated" about $\theta$ is resolved into
bounds on probabilities of events that involve $\lambda(\theta)$ rather than
$\theta$. The tools developed by Kemperman (1968) and others can be used to
investigate these issues further. The central role of $\lambda(\theta)$ in (30)
suggests that model building ought to be in terms of $\lambda_j(\theta)$ rather
than $P_j(\theta)$.
3.6 The Rasch Model

The Rasch model has a one-parameter logistic item response function which
implies that the logit function, $\lambda_j(\theta)$, has the following linear form:

$$\lambda_j(\theta) = a(\theta - b_j). \tag{32}$$

In addition, the ability distribution, $F(\theta)$, is unspecified in the Rasch
model. In Cressie and Holland (1983) it is shown that for the Rasch model the
manifest probabilities, {p(x)}, have the following log-linear form:

$$\log p(x) = \alpha + \sum_j x_j \beta_j + \sum_k \gamma_k\, \delta(x_+, k), \tag{33}$$

where $x_+ = \sum_j x_j$ and

$$\delta(x_+, k) = \begin{cases} 1 & \text{if } x_+ = k, \\ 0 & \text{otherwise.} \end{cases}$$

The parameters $\{\beta_j\}$ are unconstrained and each may vary over
$(-\infty, \infty)$. The $\gamma_k$ are the logarithms of a moment sequence, i.e.

$$\gamma_k = \log[E(U^k)],$$

for an arbitrary positive random variable U. Thus, the $\gamma_k$ are subject
to a system of inequalities described in detail in Cressie and Holland (1983)
and in de Leeuw and Verhelst (1986).

The main tool used by Cressie and Holland to establish (33) is a form of
the Dutch Identity with y=0. We may obtain an alternative formulation of (33)
using the general Dutch Identity. This is given in Theorem 2.
Theorem 2: If p(x) satisfies an IRT model with one-parameter logistic IRFs
(i.e., (32)) and general F, then p(x) satisfies the log-linear model

$$\log p(x) = \alpha + \sum_j (x_j - y_j)\beta_j + \sum_k \gamma_k\, \delta(x_+, k), \tag{34}$$

where the $\beta_j$ vary over $(-\infty, \infty)$ and the $\gamma_k$ have the form

$$\gamma_k = \log[E(U^{k - y_+})]$$

for k = 0, 1, ..., J, and U is an arbitrary positive random variable.

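Proof: From the Dutch Identity we have

$$p(x) = p(y)\, E\Big[\exp\Big\{\sum_j (x_j - y_j)\, a(\theta - b_j)\Big\} \,\Big|\, X = y\Big]$$

$$= p(y) \exp\Big\{\sum_j (-a b_j)(x_j - y_j)\Big\}\, E\big[\exp\{a\theta(x_+ - y_+)\} \mid X = y\big].$$

Now let $U = e^{a\theta}$ and take logs. This yields

$$\log p(x) = \alpha + \sum_j (x_j - y_j)\beta_j + \log E\big[U^{x_+ - y_+} \mid X = y\big],$$

where

$$\alpha = \log p(y), \quad\text{and}\quad \beta_j = -a b_j.$$

Then set

$$\gamma_k = \log E\big[U^{k - y_+} \mid X = y\big]. \quad \text{QED.}$$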
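
Theorem 2 can be illustrated with a small Rasch example. The sketch below assumes a discrete ability distribution and hypothetical item difficulties; it checks that, after removing the item main effects, what is left of log p(x) - log p(y) depends on x only through the total score, and that it matches the logarithm of the posterior moment of U for that score.

```python
import itertools
import math

# Hypothetical Rasch setup (values not from the paper).
A_COMMON = 1.2                       # common discrimination a
ITEM_B = [-1.0, -0.3, 0.4, 1.1]      # item difficulties b_j
THETAS = [-1.5, -0.5, 0.5, 1.5]      # support of a discrete F
WEIGHTS = [0.2, 0.3, 0.3, 0.2]


def cond_prob(x, theta):
    out = 1.0
    for b, xj in zip(ITEM_B, x):
        pj = 1.0 / (1.0 + math.exp(-A_COMMON * (theta - b)))
        out *= pj if xj else 1.0 - pj
    return out


def p(x):
    return sum(w * cond_prob(x, t) for t, w in zip(THETAS, WEIGHTS))


def posterior(y):
    py = p(y)
    return [w * cond_prob(y, t) / py for t, w in zip(THETAS, WEIGHTS)]


def gamma_from_probs(x, y):
    """log p(x) - log p(y) minus the item main effects, beta_j = -a b_j."""
    main = sum((xj - yj) * (-A_COMMON * b) for xj, yj, b in zip(x, y, ITEM_B))
    return math.log(p(x)) - math.log(p(y)) - main


def gamma_from_moments(k, y):
    """log E[U^(k - y_+) | X = y] with U = exp(a * theta)."""
    y_plus = sum(y)
    return math.log(sum(pw * math.exp(A_COMMON * t * (k - y_plus))
                        for t, pw in zip(THETAS, posterior(y))))


def max_gap(y):
    vectors = itertools.product((0, 1), repeat=len(ITEM_B))
    return max(abs(gamma_from_probs(x, y) - gamma_from_moments(sum(x), y))
               for x in vectors)


print(max_gap((1, 0, 1, 0)))
```

Two response vectors with the same total score give the same residual, which is the sense in which the score-level terms depend only on k.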
Cressie and Holland show that the total number of nonredundant parameters
in (33) is 2J-1: there are J $\beta$'s and J-1 $\gamma$'s. However, the
$\gamma$'s are not freely varying parameters and are subject to a system of
inequalities. While these inequalities do not restrict the $\gamma$'s in a
functional way, they do have an interesting impact on the values that the
$\gamma$'s can take on, as the next corollary shows.

Corollary 2: If p(x) satisfies the hypothesis of Theorem 2 and if y is such
that $F(\theta \mid X = y)$ is $N(\mu_y, \sigma_y^2)$ then

$$\gamma_k = a\,\mu_y (k - y_+) + \tfrac{1}{2}(a\,\sigma_y)^2 (k - y_+)^2, \tag{35}$$

so that the $\gamma_k$ lie on a quadratic curve as a function of k.

Proof: Simply evaluate $E[e^{a\theta(k - y_+)} \mid X = y]$ from the proof of
Theorem 2 using the moment generating function of a univariate normal
distribution. QED.
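
The quadratic form in Corollary 2 is just the log of a normal moment generating function, and a quick numerical check is possible. The sketch below integrates the normal mgf with a plain trapezoid rule and compares the result to the closed-form quadratic in k - y_+. The values of a, the posterior mean and standard deviation, J, and y_+ are hypothetical.

```python
import math


def log_mgf_numeric(s, mu, sigma, n=4001, width=12.0):
    """log E[exp(s * theta)] for theta ~ N(mu, sigma^2), by the trapezoid rule."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        t = lo + i * h
        dens = math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        wgt = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += wgt * math.exp(s * t) * dens
    return math.log(total * h)


def gamma_quadratic(k, y_plus, a, mu, sigma):
    """Closed form: a*mu*(k - y_+) + 0.5*(a*sigma)^2*(k - y_+)^2."""
    m = k - y_plus
    return a * mu * m + 0.5 * (a * sigma) ** 2 * m ** 2


# Hypothetical values: 10 items, origin vector with 5 correct responses.
A_COMMON, MU_Y, SIGMA_Y, J, Y_PLUS = 1.2, 0.3, 0.4, 10, 5

max_gap = max(
    abs(log_mgf_numeric(A_COMMON * (k - Y_PLUS), MU_Y, SIGMA_Y)
        - gamma_quadratic(k, Y_PLUS, A_COMMON, MU_Y, SIGMA_Y))
    for k in range(J + 1)
)
print(max_gap)
```

The check covers only the exactly normal case; for a posterior that is merely close to normal the quadratic would hold approximately.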

We observe that the existence of a y for which $\theta \mid X = y$ has a nearly
normal distribution follows from the assumption that J is large and $F(\theta)$
is smooth. Hence, rather than 2J-1 parameters, when J is large, the Rasch model
can be expected to behave as though there were only J+1 parameters: J
$\beta$'s and one $\gamma$ (the coefficient of $(x_+ - y_+)^2$).
4. DISCUSSION 

In my opinion, the Dutch Identity has been shown to be a useful tool for
the analysis of IRT models. The very fact that it implies that long tests must
exhibit very little third and higher-order interaction in their manifest
probabilities, {p(x)}, is remarkable and not well-known. I have begun Monte
Carlo simulation work to investigate how large J must be in order for SOE
models to fit data generated by a model of the form (1). For ten items with
Rasch IRFs the fit based on 30,000 simulated examinees is quite good: a
likelihood ratio chi-square of 965 on 968 degrees of freedom. For non-Rasch
IRFs (either linear logit functions with different slopes or 3PL IRFs) the fit
on ten items is not as good. These results are in agreement with the theory in
this paper, but there is clearly more work to be done.
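
The 968 degrees of freedom quoted for ten items are consistent with a simple count, sketched below under the assumption that the degrees of freedom equal the number of cells in the table, minus one for the normalizing constraint, minus the number of free first- and second-order SOE parameters. The function names are illustrative only.

```python
import math


def soe_parameter_count(J):
    """Free parameters of the general SOE model: J main effects plus C(J, 2) interactions."""
    return J + math.comb(J, 2)


def soe_degrees_of_freedom(J):
    """Cells in the 2^J table, minus 1 for normalization, minus SOE parameters."""
    return 2 ** J - 1 - soe_parameter_count(J)


def rank_one_parameter_count(J):
    """Parameters of the restricted log-quadratic submodel: two per item."""
    return 2 * J


print(soe_parameter_count(10), soe_degrees_of_freedom(10), rank_one_parameter_count(10))
```

For ten items the general SOE model has 55 parameters against 20 for the two-per-item submodel, so the gap between the two grows quickly with test length.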

A second remarkable fact that the Dutch Identity implies concerns the
number of parameters that can be estimated in a long test. The discussion in
section 3.2 shows that all "smooth" unidimensional IRT models converge to a
model of the form (19) as the number of items grows. The model in (19) has only
two parameters per item, which may be interpreted as the value of $\lambda_j$
and of its first $\theta$-derivative at a single point. Hence models that
attempt to fit three or more parameters per item can only do so successfully
for two reasons: either (1) they are not applied to a large enough item set or
(2) the test is not unidimensional. What I conjecture from this analysis is
that there is a sort of "conservation law" for IRT item parameters of the form:
a D-dimensional set of J items can only support a total of (D+1)J parameters
when J is large.



Individual items may be able to have more than D+1 parameters estimated for
them, but only at the expense of fewer estimable parameters for some other
items. The total cannot exceed (D+1)J. It will be very useful to see how this
type of result actually works on real and simulated data.